Make changes to try to prevent memory bugs in parcel resolver #9885
Conversation
We'll definitely want to clean this up ASAP once we've tested it.
Can you use cargo flags for this instead of forking the whole crate?
Criterion benchmark results HTML file
These results were taken on an EC2 c7i.8xlarge instance. They vary quite a bit because the test case is very fast. We should set up more complex test cases with more files per iteration, so that the caches and so on are also benchmarked.
The best numbers we'll get will be from production, once the flag is rolled out.
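As a std-only sketch of what the first two Criterion cases below measure (checking for a non-existent file via stat versus via open), a hypothetical micro-timer could look like this; the helper name and iteration count are illustrative, not part of the actual bench suite:

```rust
use std::time::Instant;

// Average wall time per call, in nanoseconds, over `iters` iterations.
// This is a crude stand-in for Criterion's sampling, just to show the
// shape of the stat-vs-open comparison.
fn time_ns<F: FnMut()>(iters: u32, mut f: F) -> u128 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() / iters as u128
}

fn main() {
    let missing = "/definitely-missing-path-xyz";
    // stat-based probe (Path::exists)
    let stat_ns = time_ns(1000, || {
        let _ = std::path::Path::new(missing).exists();
    });
    // open-based probe (read_to_string on a missing file returns Err)
    let open_ns = time_ns(1000, || {
        let _ = std::fs::read_to_string(missing);
    });
    println!("stat: {stat_ns} ns, open: {open_ns} ns");
}
```

On the numbers reported below, the stat-based check is roughly 3x cheaper than the open-based one, which is why the resolver's existence probes matter.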
FileSystem - check for non existent file using stat (exists)
time: [531.77 ns 533.04 ns 534.47 ns]
change: [-1.1872% -0.9784% -0.7506%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) low mild
1 (1.00%) high mild
7 (7.00%) high severe
FileSystem - check for non existent file open (read_to_string)
time: [1.5756 µs 1.5821 µs 1.5890 µs]
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
Run safe resolver simple OsFileSystem
time: [53.090 µs 53.399 µs 53.688 µs]
change: [-1.2921% -0.8435% -0.3762%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) low severe
5 (5.00%) high mild
4 (4.00%) high severe
Run safe resolver modules OsFileSystem
time: [78.842 µs 80.227 µs 81.887 µs]
change: [-2.4456% -1.4033% -0.2646%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 21 outliers among 100 measurements (21.00%)
14 (14.00%) low mild
1 (1.00%) high mild
6 (6.00%) high severe
Run unsafe resolver simple OsFileSystem
time: [48.232 µs 48.455 µs 48.658 µs]
change: [-1.9564% -1.4375% -0.9269%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
7 (7.00%) low severe
2 (2.00%) low mild
3 (3.00%) high severe
Run unsafe resolver modules OsFileSystem
time: [73.793 µs 74.305 µs 74.923 µs]
change: [+2.6275% +3.9922% +5.5641%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
Run safe resolver simple - PreloadingFileSystem - No IO
time: [23.365 µs 23.547 µs 23.760 µs]
change: [-0.8148% -0.0801% +0.6690%] (p = 0.84 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) high mild
6 (6.00%) high severe
Run safe resolver modules - PreloadingFileSystem - No IO
time: [29.443 µs 29.469 µs 29.498 µs]
change: [-2.1236% -1.6713% -1.2740%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low severe
3 (3.00%) high mild
1 (1.00%) high severe
Run unsafe resolver simple - PreloadingFileSystem - No IO
time: [18.784 µs 18.800 µs 18.820 µs]
change: [-0.2920% -0.1053% +0.0995%] (p = 0.30 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
4 (4.00%) high mild
3 (3.00%) high severe
Run unsafe resolver modules - PreloadingFileSystem - No IO
time: [25.081 µs 25.370 µs 25.792 µs]
change: [+0.6886% +1.4525% +2.2694%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 17 outliers among 100 measurements (17.00%)
9 (9.00%) high mild
8 (8.00%) high severe
These are Apple M1 Pro results:
FileSystem - check for non existent file using stat (exists)
time: [674.93 ns 689.44 ns 712.65 ns]
Found 10 outliers among 100 measurements (10.00%)
9 (9.00%) high mild
1 (1.00%) high severe
FileSystem - check for non existent file open (read_to_string)
time: [1.2387 µs 1.2618 µs 1.2946 µs]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Run safe resolver simple OsFileSystem
time: [161.04 µs 163.25 µs 166.15 µs]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
Run safe resolver modules OsFileSystem
time: [244.64 µs 249.86 µs 256.41 µs]
Found 10 outliers among 100 measurements (10.00%)
8 (8.00%) high mild
2 (2.00%) high severe
Run unsafe resolver simple OsFileSystem
time: [158.88 µs 161.76 µs 164.91 µs]
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
Run unsafe resolver modules OsFileSystem
time: [233.01 µs 237.07 µs 243.17 µs]
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low severe
4 (4.00%) high mild
9 (9.00%) high severe
Run safe resolver simple - PreloadingFileSystem - No IO
time: [18.144 µs 18.418 µs 18.842 µs]
Found 17 outliers among 100 measurements (17.00%)
6 (6.00%) high mild
11 (11.00%) high severe
Run safe resolver modules - PreloadingFileSystem - No IO
time: [24.043 µs 24.276 µs 24.489 µs]
Found 19 outliers among 100 measurements (19.00%)
8 (8.00%) high mild
11 (11.00%) high severe
Run unsafe resolver simple - PreloadingFileSystem - No IO
time: [13.251 µs 13.273 µs 13.298 µs]
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
1 (1.00%) high mild
10 (10.00%) high severe
Run unsafe resolver modules - PreloadingFileSystem - No IO
time: [18.525 µs 18.591 µs 18.672 µs]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
pub invalidate_on_file_create: DashSet<FileCreateInvalidation>,
pub invalidate_on_file_change: DashSet<PathBuf>,
pub invalidate_on_file_create: RwLock<HashSet<FileCreateInvalidation>>,
pub invalidate_on_file_change: RwLock<HashSet<PathBuf>>,
DashSet should be fine. This seems unrelated to the cache change. Putting the entire hashset behind a lock will be much slower.
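For context, the `RwLock<HashSet<PathBuf>>` variant in this diff can be sketched as below. The struct and method names are illustrative, not the crate's actual API; the point is that every writer contends on one global lock, whereas `DashSet` would only lock the shard owning the key:

```rust
use std::collections::HashSet;
use std::path::PathBuf;
use std::sync::RwLock;

// Hypothetical invalidation store, sketching the RwLock<HashSet> side
// of this diff.
struct Invalidations {
    invalidate_on_file_change: RwLock<HashSet<PathBuf>>,
}

impl Invalidations {
    fn new() -> Self {
        Invalidations {
            invalidate_on_file_change: RwLock::new(HashSet::new()),
        }
    }

    // All concurrent writers serialize on this single write lock.
    // Returns true if the path was newly inserted.
    fn add(&self, path: PathBuf) -> bool {
        self.invalidate_on_file_change.write().unwrap().insert(path)
    }

    fn len(&self) -> usize {
        self.invalidate_on_file_change.read().unwrap().len()
    }
}
```

Readers can still proceed in parallel under the read lock; the contention concern is specifically about concurrent inserts during resolution.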
Hey @devongovett, from what I've measured, this isn't slower; we won't accept any significant regression.
I'm keen to remove a few of the bits here if possible. DashSet should be fine if we trust its implementation, but we need to simplify a few parts to improve reliability.
What makes you think this is causing issues? It seems unrelated to the potential memory safety problems you mentioned.
Dashmap is a super popular crate. It just shards the map into multiple disjoint maps under individual locks that can be accessed concurrently. I've previously measured improvements on benchmarks by using it. The resolver is quite hot and I did a lot of performance tuning on it.
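The sharding idea described here can be illustrated with a minimal std-only sketch (this is not DashMap's actual implementation, just the concept): split one set into N disjoint sets, each behind its own lock, and pick the shard by hashing the key, so writers to different shards never contend.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

// Toy sharded set: each shard has its own RwLock, so concurrent
// inserts only collide when they hash to the same shard.
struct ShardedSet<T> {
    shards: Vec<RwLock<HashSet<T>>>,
}

impl<T: Hash + Eq> ShardedSet<T> {
    fn new(num_shards: usize) -> Self {
        ShardedSet {
            shards: (0..num_shards).map(|_| RwLock::new(HashSet::new())).collect(),
        }
    }

    fn shard_for(&self, value: &T) -> usize {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        (hasher.finish() as usize) % self.shards.len()
    }

    // Only the owning shard's lock is taken; other shards stay free.
    fn insert(&self, value: T) -> bool {
        let i = self.shard_for(&value);
        self.shards[i].write().unwrap().insert(value)
    }

    fn contains(&self, value: &T) -> bool {
        self.shards[self.shard_for(value)].read().unwrap().contains(value)
    }
}
```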
I was hoping to reduce the surface area of what can break.
As a follow-up or in this diff, I can include more benchmarks that try to replicate the performance gains that you've found. We can commit them onto the criterion suite and I can run them on both macOS and Linux/EC2.
Hash(String),
Package(String, String),
Builtin(String),
Url(String),
}
This also seems quite unrelated from the cache changes.
This is trying to simplify the code.
I understand Cow is here as a performance optimisation, but it doesn't seem to make a measurable difference to performance.
That seems unrelated to the safety issues though. Small things like not copying specifiers do add up in my experience. Perhaps it doesn't appear to make a difference right now because the rest of your build is a larger portion of the overall time than the resolver.
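To make the trade-off concrete, here is a hypothetical specifier-normalisation helper (the function name and behaviour are illustrative, not the resolver's actual code) showing what `Cow` buys: the common unchanged path stays borrowed and allocation only happens when a rewrite is needed, whereas the owned `String` variants in this diff allocate on every call:

```rust
use std::borrow::Cow;

// Strip a query suffix from a specifier. Most specifiers have no
// query, so they pass through as Cow::Borrowed with zero copies;
// only the rewritten ones allocate a new String.
fn strip_query(specifier: &str) -> Cow<'_, str> {
    match specifier.find('?') {
        Some(i) => Cow::Owned(specifier[..i].to_string()),
        None => Cow::Borrowed(specifier),
    }
}
```

In a hot path called once per import, avoiding that allocation on the common case is exactly the kind of small win being discussed here.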
Would you know roughly what types of benchmarks you've run? I would be keen to add some benchmarks we can use to measure this.
Note that in the benchmarks I've posted above there is a 1-10% difference in overall time. However, since the resolver is already quite fast, I don't think that will translate into a meaningful difference in build time. I'm quite confident we can optimise this much more by optimising IO and JSON parsing.
I've added a few notes on the benchmark suite I added.
I'd need to run them again; it was during development several years ago. It just seems strange to remove optimizations we already implemented that aren't causing any particular problem.
Our intention isn't to remove any optimisations; we are only trying to measure things. I can add DashMap back quite easily, but I wanted to measure what allocation does here.
I was asking what you've benchmarked so we can understand and reproduce your findings. Regardless of whether you run them again, we'd want to know what types of benchmarks you're running and what the results are.
Usually end-to-end benchmarks like esbuild's, or https://github.com/web-infra-dev/bundler-benchmark
That'd be better, but if I do that, can I switch between the two variants at runtime? I think it won't make a big difference, as the intention is just to do a safer, measured roll-out and then either revert or roll forward shortly after.
Not at runtime but at compile time, e.g. in a canary or dev build. Otherwise the binary size will increase from shipping two resolver implementations.
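The compile-time switch suggested here would be a Cargo feature selecting the implementation, so each build ships only one resolver. A minimal sketch, assuming a hypothetical feature name `unsafe-resolver` (not an actual flag in the crate):

```rust
// With a Cargo feature, the unused variant is compiled out entirely,
// so a canary build can carry one resolver and a release build the
// other without both being present in the binary.
fn resolver_variant() -> &'static str {
    if cfg!(feature = "unsafe-resolver") {
        "unsafe" // arena-based resolver selected at build time
    } else {
        "safe" // owned-data resolver is the default
    }
}
```

The corresponding `[features]` entry in Cargo.toml would declare `unsafe-resolver`, and a canary build would pass `--features unsafe-resolver`.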
@devongovett The reason the crate is copied is that I need to be able to switch at runtime, so that we can roll this out internally while measuring the outcomes. Once we do that, we will have solid numbers on the impact of this change.
Why does it need to be at runtime rather than just releasing a different dev/canary version and testing with that?
Switching at runtime will help us:
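The runtime switch described in this thread could look like the sketch below: both variants are compiled in, and a flag picks one per run, so a single internal binary can compare them. The env var name and the resolver stand-ins are hypothetical, purely for illustration:

```rust
// Hypothetical runtime flag: when PARCEL_SAFE_RESOLVER is set to
// "1" or "true", the safe (owned-data) resolver copy is used;
// otherwise the existing arena-based resolver runs. This is what a
// gradual internal rollout would toggle per build or per cohort.
fn use_safe_resolver() -> bool {
    std::env::var("PARCEL_SAFE_RESOLVER")
        .map(|v| v == "1" || v.eq_ignore_ascii_case("true"))
        .unwrap_or(false)
}

// Stand-in for dispatching between the two resolver implementations;
// the real code would call into the copied crate vs the original one.
fn resolve(specifier: &str) -> String {
    if use_safe_resolver() {
        format!("safe:{specifier}")
    } else {
        format!("unsafe:{specifier}")
    }
}
```

The cost devongovett notes is real: unlike the compile-time feature, this ships both implementations, so the binary carries two resolvers for the duration of the rollout.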
This is a critical problem for us at the moment, so I'm merging and iterating.
I think we may need to cut a separate branch for your team to work from, so that v2 is always in a stable, releasable/auditable state (or as close as possible). It's fine to do experiments or hotfixes in your own projects or in canary/dev builds, but v2 is getting messy and it is blocking stable releases. It seems like you don't really use stable releases anyway, given that we haven't done one since February, so it might be better if you had a branch that canary releases were published from and we merged things into v2 once they're stable.
This PR adds a copy of the resolver package that moves towards owned data structures rather than arenas and borrowing.
The crate is copied so that it can be feature-flagged.
Benchmarks have been added which characterise the performance profile of the resolver (it is mostly IO bound) and show that the owned version is only marginally slower than the previous copy-avoiding implementation (a single-digit % difference).
Preliminary end-to-end performance testing on a large application shows no difference in overall build time at this point.
The feature flag is in place to validate the impact on other metrics, including the frequency of the crashes we are experiencing.
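The direction of the change (arena/borrowed data to owned data) can be sketched as below. The type names are illustrative, not the crate's actual types; the point is that the owned result drops the lifetime parameter, so it can be stored, cached, and sent across threads without lifetime plumbing, at the cost of an allocation per resolution:

```rust
// Arena-style result: borrows from the arena, so its lifetime is
// tied to the arena's, which is fast but harder to get right.
struct BorrowedResolution<'a> {
    file_path: &'a str,
}

// Owned result: carries its own allocation and has no lifetime
// parameter, trading a copy for simpler, safer ownership.
struct OwnedResolution {
    file_path: String,
}

// Converting is a single copy of the borrowed data.
fn to_owned(resolution: &BorrowedResolution) -> OwnedResolution {
    OwnedResolution {
        file_path: resolution.file_path.to_string(),
    }
}
```

Per the benchmarks above, this kind of extra copying is what accounts for the single-digit percentage slowdown of the safe variant.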